4. DynamoDB Basics

24. Terminology Comparison with SQL#

mandatory attribute - partition key
optional attribute - sort key | range key

Why every table must have exactly one primary key

It looks costly, but in the end it enables very high performance.

25. DynamoDB Tables and Naming Conventions#

In many DBMSs there are multiple databases, and each database contains multiple tables.

DynamoDB Tables#

There is no concept of multiple databases.
Tables are top-level entities.
Tables belong to an AWS region, but together they appear as a single database.

Table Naming Conventions#

Prefix table names to create namespaces
prefix.tablename or prefix_tablename
e.g. test.users, test.projects or test_users, test_projects
Not mandatory, yet a good practice to follow.

It helps you find tables quickly.
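
The prefix convention above can be sketched as a small helper. The function name `prefixed_table_name` and its parameters are hypothetical; the validation pattern assumes the documented DynamoDB rule that table names are 3 to 255 characters drawn from letters, digits, `_`, `-` and `.`.

```python
import re

# Assumed naming rule: 3-255 characters from a-z, A-Z, 0-9, '_', '-', '.'
_VALID_NAME = re.compile(r"^[A-Za-z0-9_.-]{3,255}$")


def prefixed_table_name(prefix, table, separator="_"):
    """Build a namespaced table name such as 'test_users' or 'test.users'."""
    name = f"{prefix}{separator}{table}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name


print(prefixed_table_name("test", "users"))          # underscore style
print(prefixed_table_name("test", "projects", "."))  # dot style
```

Keeping the prefix in one place makes it easy to switch namespaces (for example from a test prefix to a production one) without touching the rest of the code.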

Top level Entities#

Each AWS region holds its own set of tables.
Tables can therefore live in separate locations.
This course assumes that DynamoDB exists in a single region.
Whenever cross-region tables come up, this is stated explicitly.
Because tables are top-level entities, two tables in the same account and region cannot share a name.

Independent Entities#

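Tables are independent entities.
There are no foreign key relationships.
DynamoDB does not enforce relationships.
The lack of strict relationships looks like a limitation, but it makes table queries very efficient.
With no joins, each table can be given the capacity appropriate for it.
This makes the performance of each table predictable.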

Flexible Schema#

The ACID guarantees of an RDB come at the cost of flexibility.

Different table items can have different attributes.
The one attribute common to all items is the table's primary key.
You do not have to define every attribute when creating a DynamoDB table.
When creating a table, you define the primary key and, optionally, any local secondary indexes you need.
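
As a rough sketch of what this looks like in practice, the request parameters below declare only the key attribute. The table name `test_users` and the attribute `UserId` are hypothetical. The dictionary follows the shape of the `CreateTable` API and could be passed to an SDK call such as boto3's `create_table`, which is deliberately not invoked here.

```python
# Hypothetical table definition: only the primary key attribute is declared.
create_table_params = {
    "TableName": "test_users",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}

# Items in the same table may carry different non-key attributes.
items = [
    {"UserId": "u1", "name": "John", "age": 22},
    {"UserId": "u2", "email": "jane@example.com"},
]

# The only thing every item must share is the primary key.
key_names = {k["AttributeName"] for k in create_table_params["KeySchema"]}
for item in items:
    assert key_names <= set(item), "every item needs the primary key"
```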

26. Data Types in DynamoDB#

Scalar Types#

Exactly One Value
e.g. string, number, binary, boolean, and null
Keys or Index attributes only support string, number and binary scalar types

String#

Stores text data (UTF-8 encoded)
Only non-empty values
Maximum: 400 KB (the item size limit)
As a partition key, a string can be at most 2048 bytes. (The total size of the item is also limited.)
As a sort key, a string can be at most 1024 bytes.
e.g. "John", "California", "Fox in Socks"

Number#

Stores all numeric types
e.g. 123, 100.88, -5, 0
Sent to the API as strings, but treated as numbers for calculations.

Boolean#

true or false

Binary#

Blobs of binary data
e.g. compressed text, encrypted data, images etc
Only non-empty values
e.g.
- "QmFzZTYOIGVu..."
As a partition (hash) key, a binary value can be at most 2048 bytes.
As a sort (range) key, a binary value can be at most 1024 bytes.

Null#

Unknown or undefined state

Set Types#

Multiple scalar values
Unordered collection of strings, numbers or binary values
e.g. string set, number set and binary set
Only non-empty values
No duplicates allowed
No empty sets allowed
All values must be of same scalar type

Document Types#

Complex structure with nested attributes
e.g. list and map
Nesting up to 32 levels deep
Only non-empty values within lists and maps
Empty lists and maps are allowed

List#

Ordered collection of values
Can have multiple data types
e.g. ["John", 128.88, "Apples"]

Maps#

Unordered collection of Key-Value pairs
Ideal for storing JSON documents
e.g.

{
  name: "John",
  age: 22,
  address: {
    city: "Stamford",
    state: "Connecticut"
  }
}

DynamoDB can use JSON to interact with users.
However, it does not actually store the data as JSON.
The DynamoDB data types are a superset of JSON.
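
To make the typed representation concrete, here is a minimal sketch of an encoder that turns plain Python values into DynamoDB's type-annotated JSON form (`S`, `N`, `BOOL`, `NULL`, `L`, `M`). The function name `to_attribute_value` is hypothetical, and sets and binary values are left out for brevity.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Encode a Python value in DynamoDB's type-annotated JSON form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


person = {
    "name": "John",
    "age": 22,
    "address": {"city": "Stamford", "state": "Connecticut"},
}
print(to_attribute_value(person))
```

Each value is wrapped with its type tag, which is how one document can mix types that plain JSON cannot distinguish, such as sets or binary data.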

27. DynamoDB Consistency Model#

AWS Infrastructure#

Automatic Synchronous Replication#

Even if a facility suffers a failure or downtime,
DynamoDB continues to provide consistent performance and scale.

DynamoDB Read Consistency#

Strong Consistency#

The most up-to-date data
Must be requested explicitly

Eventual Consistency#

May or may not reflect the latest copy of data
- This can only happen when the data was written within the last two or three seconds.
Default consistency for all operations (unless strong consistency is requested)
50% cheaper than strong consistency
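
A minimal sketch of how the two read modes might be requested: reads are eventually consistent unless the request sets `ConsistentRead`. The helper `get_item_params`, the table name and the key are hypothetical. The resulting dictionary follows the shape of the `GetItem` API and is only built here, not sent.

```python
def get_item_params(table, key, strongly_consistent=False):
    """Build GetItem parameters; strong consistency must be asked for explicitly."""
    params = {"TableName": table, "Key": key}
    if strongly_consistent:
        params["ConsistentRead"] = True
    return params


key = {"UserId": {"S": "u1"}}  # hypothetical key
default_read = get_item_params("test_users", key)
strong_read = get_item_params("test_users", key, strongly_consistent=True)
print(default_read)
print(strong_read)
```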

28. DynamoDB Capacity Units#

DynamoDB Tables#

Top-level entities
No Strict inter-table relationships
Mandatory primary keys
Control performance at the table level

Because each DynamoDB table is independent, performance can be controlled and tuned per table.

Throughput Capacity#

In DynamoDB, you must provision throughput capacity for each table you create.

Allows for predictable performance at scale
Used to control read/write throughput
Supports auto-scaling
Defined in terms of RCUs and WCUs
A major factor in DynamoDB pricing
1 Capacity Unit = 1 Request/sec

RCU#

Read Capacity Units
1 RCU = 1 strongly consistent table read/sec
- (With strong consistency, data is passed on to all replicas as soon as a write request reaches one replica of the database.)
- (The response may be delayed as a result.)
1 RCU = 2 eventually consistent table reads/sec
- (Eventual consistency offers low latency at the risk of returning stale data)
In blocks of 4KB

WCU#

Write capacity unit
1 WCU = 1 table write/sec
In blocks of 1KB

Example#

Average Item Size: 10KB
Provisioned Capacity: 10 RCUs and 10 WCUs
Read throughput with strong consistency = 4KB x 10 = 40KB/sec
Read throughput with eventual consistency = 2 x (4KB x 10) = 80KB/sec
Write throughput = 1KB x 10 = 10KB/sec
RCUs to read 10KB of data per second with strong consistency = 10KB/4KB = 2.5 => rounded up => 3 RCUs
RCUs to read 10KB of data per second with eventual consistency = 3 RCUs x 0.5 = 1.5 RCUs
WCUs to write 10KB of data per second = 10KB/1KB = 10 WCUs
WCUs to write 1.5KB of data per second = 1.5KB/1KB = 1.5 => rounded up => 2 WCUs
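
The arithmetic in this example can be checked with a short sketch. The function names are hypothetical; the block sizes (4KB per read unit, 1KB per write unit) and the half-price eventually consistent read come from the rules listed above.

```python
import math

READ_BLOCK_KB = 4   # one RCU covers a read of up to 4KB
WRITE_BLOCK_KB = 1  # one WCU covers a write of up to 1KB


def rcus_per_read(item_kb, strongly_consistent=True):
    """RCUs consumed by one read per second of an item of the given size."""
    blocks = math.ceil(item_kb / READ_BLOCK_KB)
    return blocks if strongly_consistent else blocks * 0.5


def wcus_per_write(item_kb):
    """WCUs consumed by one write per second of an item of the given size."""
    return math.ceil(item_kb / WRITE_BLOCK_KB)


print(rcus_per_read(10))                             # strong consistency
print(rcus_per_read(10, strongly_consistent=False))  # eventual consistency
print(wcus_per_write(10))
print(wcus_per_write(1.5))
```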

Burst Capacity#

Provides additional capacity for bursts or spikes.
If the application keeps exceeding its provisioned capacity, DynamoDB starts throttling requests.
DynamoDB retains up to 5 minutes of unused read/write capacity for use during temporary spikes and bursts.
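
The idea of banking unused capacity can be illustrated with a deliberately simplified model. This is an assumption-laden sketch, not DynamoDB's actual internal algorithm: each second, unused provisioned capacity is saved up to a cap of 300 seconds' worth, and demand beyond provisioned plus banked capacity is counted as throttled. The function name `simulate_burst` is hypothetical.

```python
def simulate_burst(provisioned, demand_per_second, window_seconds=300):
    """Return the number of throttled requests for each second (toy model)."""
    bank = 0  # unused capacity carried over from earlier seconds
    throttled = []
    for demand in demand_per_second:
        available = provisioned + bank
        served = min(demand, available)
        throttled.append(demand - served)
        # Whatever was not used is banked, capped at window_seconds of capacity.
        bank = min(available - served, provisioned * window_seconds)
    return throttled


# Two idle seconds build up a reserve that absorbs the first spike only.
print(simulate_burst(10, [0, 0, 30, 30]))  # -> [0, 0, 0, 20]
```

In this toy run the first spike of 30 requests is fully served from provisioned plus banked capacity, while the second spike finds the bank empty and 20 requests are throttled.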

Scaling#

Scaling happens asynchronously, in the background, and without downtime.
- Scaling Up: As and when needed
- Scaling Down: Up to 4 times in a day
Capacity changes also affect the number of table partitions, so keep this in mind when choosing the initial capacity and before changing it.
1 partition supports up to 1000 WCUs or 3000 RCUs

29. DynamoDB On-Demand Capacity#

AWS now supports an on-demand capacity mode for DynamoDB.
It is offered in addition to the provisioned capacity mode covered in the previous lectures.

On-demand capacity mode#

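DynamoDB charges for the data reads and writes your application performs on its tables.
DynamoDB instantly accommodates workloads as they ramp up or down, so you do not need to specify the read and write throughput you expect the application to perform.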

Best if you#

Create tables with unknown workloads
Have applications with unpredictable traffic
Prefer the simplicity of paying only for what you use
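
As a hedged sketch, the difference between the two capacity modes shows up in the table settings roughly as follows. The helper `capacity_settings` is hypothetical; the `BillingMode` values `PAY_PER_REQUEST` and `PROVISIONED` are the ones used by the `CreateTable` API, and the resulting dictionary would be merged into the rest of the table definition.

```python
def capacity_settings(on_demand, read_units=None, write_units=None):
    """Return the capacity-related part of a CreateTable request."""
    if on_demand:
        # On-demand: no throughput figures are specified up front.
        return {"BillingMode": "PAY_PER_REQUEST"}
    if read_units is None or write_units is None:
        raise ValueError("provisioned mode needs read_units and write_units")
    return {
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


print(capacity_settings(on_demand=True))
print(capacity_settings(on_demand=False, read_units=10, write_units=10))
```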

30. Basics of DynamoDB Partitions#

To design DynamoDB tables efficiently and cost-effectively, you need to understand how DynamoDB works internally.

Understanding partitions is an important part of that.

DynamoDB Partitions Overview#

Store DynamoDB table Data
A table can have multiple partitions
The number of table partitions depends on its size and provisioned capacity
Managed internally by DynamoDB
1 partition = up to 10GB of data
1 partition = up to 1000 WCUs or 3000 RCUs
Additional partitions are allocated in the background and without downtime.

Partition Behavior - Example#

Provisioned Capacity: 500 RCUs and 500 WCUs
Number of partitions:
- P_T = (500 RCUs/3000 + 500 WCUs/1000) = 0.67 => rounded up => 1 partition
New Capacity: 1000 RCUs and 1000 WCUs
- P_T = (1000 RCUs/3000 + 1000 WCUs/1000) = 1.33 => rounded up => 2 partitions
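
The throughput-based calculation above can be sketched as a small function. The name `partitions_for_throughput` is hypothetical, and the formula is the one used in this example; partitioning itself is managed internally by DynamoDB, so treat the result as an estimate.

```python
import math

MAX_RCUS_PER_PARTITION = 3000
MAX_WCUS_PER_PARTITION = 1000


def partitions_for_throughput(rcus, wcus):
    """Estimate the partition count needed for the provisioned throughput."""
    raw = rcus / MAX_RCUS_PER_PARTITION + wcus / MAX_WCUS_PER_PARTITION
    return math.ceil(raw)


print(partitions_for_throughput(500, 500))    # 0.67 rounded up
print(partitions_for_throughput(1000, 1000))  # 1.33 rounded up
```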

31. Basics of DynamoDB Indexes#

This lecture covers how DynamoDB stores data in partitions.

Table Index#

Mandatory Primary Key - Either simple or composite
Simple Primary Key => Only Partition or Hash Key
Composite Primary Key => Partition Key + Sort or Range Key
The Partition or Hash Key decides the target partition.

Indexes make queries very fast. DynamoDB internally builds an index based on the primary key.

Hash Algorithm | Hash Function#

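Used to compute the hash of the partition key.
The partition key value determines the partition in which an item is stored.
All items with the same partition key value are stored physically close together, ordered by sort key value.
There is no way to query table data without specifying a partition key.
So whenever you read an item or query items from a table, you must specify the partition key.
DynamoDB then computes the hash of the partition key to locate the item and returns the requested data.
You can, of course, run a Scan operation, which does not require a partition key.
- Use a Scan only when it is truly necessary.
- Avoid Scan operations wherever possible.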
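
The placement logic described above can be illustrated with a toy model. DynamoDB's real hash function is internal, so MD5 is used here purely as a stand-in, and the function names and the sample keys are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition index (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def place_items(items, num_partitions):
    """Group (partition_key, sort_key) pairs by partition, sorted by sort key."""
    layout = defaultdict(lambda: defaultdict(list))
    for pk, sk in items:
        layout[partition_for(pk, num_partitions)][pk].append(sk)
    for by_key in layout.values():
        for sort_keys in by_key.values():
            sort_keys.sort()  # same partition key: stored in sort key order
    return layout


items = [("IT", "E300"), ("HR", "E100"), ("IT", "E120"), ("IT", "E250")]
layout = place_items(items, num_partitions=4)
it_partition = partition_for("IT", 4)
print(it_partition, layout[it_partition]["IT"])
```

Because the partition is derived from the partition key alone, a lookup that supplies the key can go straight to one partition, whereas a Scan has to visit all of them.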

32. Local Secondary Indexes and Global Secondary Indexes#

Local Secondary Index#

What if we need a query that returns the employees in the IT department, sorted by DoJ?

We would have to query every employee in the IT department.
That is inefficient.

Instead, we define a sort key on DoJ.

Compared with the case where EmpID is the primary sort key,

Dept is still the primary partition key.
DoJ is given the role of a secondary sort key, and this is what is called a Local Secondary Index.

Local Secondary Index

A Local Secondary Index must be created when the table is created.
Up to 5 can be created per table.
The table's provisioned RCUs and WCUs are shared with its Local Secondary Indexes.
- Both eventually consistent and strongly consistent queries can be run against a Local Secondary Index.
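
A sketch of how the employee example might be declared. The attribute names Dept, EmpID and DoJ come from the example above; the table name `test_employees`, the index name `DeptByDoJ`, the string attribute types and the capacity figures are assumptions. The dictionary follows the shape of the `CreateTable` API and is only constructed here.

```python
employee_table_params = {
    "TableName": "test_employees",  # hypothetical name
    "AttributeDefinitions": [
        {"AttributeName": "Dept", "AttributeType": "S"},
        {"AttributeName": "EmpID", "AttributeType": "S"},
        {"AttributeName": "DoJ", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "Dept", "KeyType": "HASH"},    # partition key
        {"AttributeName": "EmpID", "KeyType": "RANGE"},  # primary sort key
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "DeptByDoJ",  # hypothetical name
            "KeySchema": [
                {"AttributeName": "Dept", "KeyType": "HASH"},  # same partition key
                {"AttributeName": "DoJ", "KeyType": "RANGE"},  # alternative sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # Shared by the table and its local secondary indexes.
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def hash_key(key_schema):
    """Return the partition (HASH) key name from a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


lsi = employee_table_params["LocalSecondaryIndexes"][0]
assert hash_key(lsi["KeySchema"]) == hash_key(employee_table_params["KeySchema"])
```

The index carries no throughput settings of its own, which reflects the point that it shares the table's RCUs and WCUs.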

Global Secondary Index#

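In this example, we want the employees whose Location is NYC, sorted by DoJ.

This data does not depend on the table's partition key.
Defining an additional Local Secondary Index is not an option either.

So Location is defined as the partition key and DoJ as the sort key.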

Global Secondary Index

Can be created at any time
Stored in its own partitions, separate from the table
Has its own throughput capacity settings
- RCUs and WCUs are defined for the index and are not shared with the table
Only eventually consistent reads are possible
Strongly consistent reads are not supported
Duplicate items can exist (index key values do not have to be unique)
When items are written to the table, the Global Secondary Index is updated asynchronously
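
A sketch of the Location example, under the same caveats as any illustration here: the index name `LocationByDoJ`, the table name `test_employees` and the capacity figures are hypothetical. The first dictionary follows the shape of a `GlobalSecondaryIndexes` entry and the second the shape of a `Query` request in the low-level API; neither is sent anywhere. The `#loc` placeholder is used on the assumption that `Location` may clash with a reserved word.

```python
location_index = {
    "IndexName": "LocationByDoJ",  # hypothetical name
    "KeySchema": [
        {"AttributeName": "Location", "KeyType": "HASH"},  # index partition key
        {"AttributeName": "DoJ", "KeyType": "RANGE"},      # index sort key
    ],
    "Projection": {"ProjectionType": "ALL"},
    # Unlike a local index, a global index has its own throughput settings.
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}

# Employees located in NYC, returned in ascending DoJ order.
query_params = {
    "TableName": "test_employees",  # hypothetical name
    "IndexName": location_index["IndexName"],
    "KeyConditionExpression": "#loc = :loc",
    "ExpressionAttributeNames": {"#loc": "Location"},
    "ExpressionAttributeValues": {":loc": {"S": "NYC"}},
    "ScanIndexForward": True,  # ascending sort key order
    # No ConsistentRead flag: only eventually consistent reads are supported.
}

print(query_params["IndexName"])
```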

33. Interacting with DynamoDB#

Different Ways to work with DynamoDB#

AWS Management Console
AWS CLI
AWS SDK
